initialisation scheme
referees correctly identified the main contributions of this work, namely: 1) a physically-inspired RNN generalising
We are grateful to the anonymous referees for their valuable comments and suggestions. A smoother introduction of equation (2) would be nice. To retain the PGD interpretation of the network, one can use truncated ReLUs as activation functions. I believe that the experimental results section could have been larger, with more results to support the claims. The ninth page will comment further on the experimental results.
Approximate Gaussianity Beyond Initialisation in Neural Networks
Hirst, Edward, Ramgoolam, Sanjaye
Ensembles of neural network weight matrices are studied through the training process for the MNIST classification problem, testing the efficacy of matrix models for representing their distributions, under assumptions of Gaussianity and permutation-symmetry. The general 13-parameter permutation invariant Gaussian matrix models are found to be effective models for the correlated Gaussianity in the weight matrices, beyond the range of applicability of the simple Gaussian with independent identically distributed matrix variables, and notably well beyond the initialisation step. The representation theoretic model parameters, and the graph-theoretic characterisation of the permutation invariant matrix observables give an interpretable framework for the best-fit model and for small departures from Gaussianity. Additionally, the Wasserstein distance is calculated for this class of models and used to quantify the movement of the distributions over training. Throughout the work, the effects of varied initialisation regimes, regularisation, layer depth, and layer width are tested for this formalism, identifying limits where particular departures from Gaussianity are enhanced and how more general, yet still highly-interpretable, models can be developed.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
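As a rough illustration of how a Wasserstein distance between two Gaussian models can be evaluated, the sketch below uses the standard closed form for the 2-Wasserstein distance between multivariate Gaussians, W_2^2 = |m_a - m_b|^2 + Tr(S_a + S_b - 2 (S_b^{1/2} S_a S_b^{1/2})^{1/2}). The abstract does not state which Wasserstein order or parameterisation the authors use, so the choice of W_2 and the function names `psd_sqrt` and `gaussian_w2` are assumptions made for this example only.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    sym = 0.5 * (mat + mat.T)            # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)      # clamp tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b).

    Hypothetical helper: assumes the W_2 closed form for Gaussians,
    which may differ from the distance used in the paper.
    """
    mean_a, mean_b = np.asarray(mean_a, float), np.asarray(mean_b, float)
    cov_a, cov_b = np.asarray(cov_a, float), np.asarray(cov_b, float)
    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)
    squared = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(squared, 0.0)))  # guard against negative round-off
```

Tracking `gaussian_w2` between the fitted Gaussian at successive training epochs would give one number per step for how far the weight distribution has moved, which is the kind of quantity the abstract describes.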
Eigenvalue initialisation and regularisation for Koopman autoencoders
Miller, Jack W., O'Neill, Charles, Constantinou, Navid C., Azencot, Omri
Regularising the parameter matrices of neural networks is ubiquitous in training deep models. Typical regularisation approaches suggest initialising weights using small random values and penalising weights to promote sparsity. However, these widely used techniques may be less effective in certain scenarios. Here, we study the Koopman autoencoder model, which includes an encoder, a Koopman operator layer, and a decoder. These models have been designed to tackle physics-related problems with interpretable dynamics and an ability to incorporate physics-related constraints. However, the majority of existing work employs standard regularisation practices. In our work, we take a step toward augmenting Koopman autoencoders with initialisation and penalty schemes tailored for physics-related settings. Specifically, we propose the "eigeninit" initialisation scheme that samples initial Koopman operators from specific eigenvalue distributions. In addition, we suggest the "eigenloss" penalty scheme that penalises the eigenvalues of the Koopman operator during training. We demonstrate the utility of these schemes on two synthetic datasets: a driven pendulum and flow past a cylinder; and two real-world problems: ocean surface temperatures and cyclone wind fields. We find on these datasets that eigenloss and eigeninit improve the convergence rate by up to a factor of 5, and that they reduce the cumulative long-term prediction error by up to a factor of 3. Such a finding points to the utility of incorporating similar schemes as an inductive bias in other physics-related deep learning approaches.
- Asia > Middle East > Jordan (0.04)
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (4 more...)
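The two ideas in this abstract, initialising an operator from chosen eigenvalues and penalising its eigenvalues, can be sketched in a few lines. This is a minimal illustration and not the authors' eigeninit or eigenloss: the abstract does not give the eigenvalue distributions or the penalty form, so the conjugate-pair construction, the unit-circle threshold `max_modulus`, and the function names `init_from_eigenvalues` and `eigenvalue_penalty` are all assumptions.

```python
import numpy as np


def init_from_eigenvalues(radii, angles, rng):
    """Real matrix whose eigenvalues are r * exp(+/- i * theta) for each pair.

    Hypothetical construction: 2x2 scaled-rotation blocks on the diagonal,
    followed by a random orthogonal change of basis (which preserves
    the eigenvalues).
    """
    n = 2 * len(radii)
    blocks = np.zeros((n, n))
    for k, (r, theta) in enumerate(zip(radii, angles)):
        c, s = r * np.cos(theta), r * np.sin(theta)
        blocks[2 * k:2 * k + 2, 2 * k:2 * k + 2] = [[c, -s], [s, c]]
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q @ blocks @ q.T


def eigenvalue_penalty(operator, max_modulus=1.0):
    """Sum of squared excesses of eigenvalue moduli above max_modulus.

    Assumed penalty form: it is zero when every eigenvalue lies inside
    or on the circle of radius max_modulus.
    """
    moduli = np.abs(np.linalg.eigvals(operator))
    return float(np.sum(np.maximum(moduli - max_modulus, 0.0) ** 2))


rng = np.random.default_rng(0)
koopman = init_from_eigenvalues(radii=[0.9, 1.0], angles=[0.3, 1.2], rng=rng)
penalty = eigenvalue_penalty(koopman)
```

In a training loop the penalty would be added to the reconstruction and prediction losses with a weighting coefficient; doing so with automatic differentiation requires a framework whose eigenvalue routine supports gradients, which NumPy does not provide.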
Speech Modelling Using Subspace and EM Techniques
Smith, Gavin, Freitas, João F. G. de, Robinson, Tony, Niranjan, Mahesan
The speech waveform can be modelled as a piecewise-stationary linear stochastic state space system, and its parameters can be estimated using an expectation-maximisation (EM) algorithm. One problem is the initialisation of the EM algorithm. Standard initialisation schemes can lead to poor formant trajectories. These trajectories, however, are important for vowel intelligibility. The aim of this paper is to investigate the suitability of subspace identification methods to initialise EM. The paper compares the subspace state space system identification (4SID) method with the EM algorithm. The 4SID and EM methods are similar in that they both estimate a state sequence (but using Kalman filters and Kalman smoothers respectively), and then estimate parameters (but using least-squares and maximum likelihood respectively).
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (2 more...)
Speech Modelling Using Subspace and EM Techniques
Smith, Gavin, Freitas, João F. G. de, Robinson, Tony, Niranjan, Mahesan
The speech waveform can be modelled as a piecewise-stationary linear stochastic state space system, and its parameters can be estimated using an expectation-maximisation (EM) algorithm. One problem is the initialisation of the EM algorithm. Standard initialisation schemes can lead to poor formant trajectories. These trajectories, however, are important for vowel intelligibility. The aim of this paper is to investigate the suitability of subspace identification methods to initialise EM. The paper compares the subspace state space system identification (4SID) method with the EM algorithm. The 4SID and EM methods are similar in that they both estimate a state sequence (but using Kalman filters and Kalman smoothers respectively), and then estimate parameters (but using least-squares and maximum likelihood respectively).
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- (2 more...)